Analysis and Construction of Noun Hypernym Hierarchies to Enhance Roget’s Thesaurus

Authors

  • Alistair Kennedy
  • Stan Szpakowicz
  • Darren Kipp
Abstract

Lexical resources are machine-readable dictionaries or word lists that encode semantic relationships between terms. They have been used for many tasks, such as word sense disambiguation and determining semantic similarity between terms. In recent years, some research has gone into automatically building lexical resources from large corpora. In this thesis I examine methods of constructing a lexical resource not from scratch, but by expanding an existing one. Roget’s Thesaurus is a lexical resource that groups terms together based on degrees of semantic relatedness. One of its weaknesses is that it does not specify the nature of the relationships between terms; it only indicates that a relationship exists. I attempt to label the relationships between terms in the Thesaurus. These relationships could include synonymy, hyponymy/hypernymy and meronymy/holonymy, and I examine the Thesaurus for all of them. Sources of these relationships include other lexical resources such as WordNet, as well as large corpora and specialized texts such as dictionaries. Roget’s Thesaurus has other weaknesses, including a somewhat outdated lexicon: our version was created in 1987 and so does not contain words or phrases related to the Internet and other advances since then. I examine methods of creating a hypernym hierarchy of nouns. A hierarchy is constructed automatically and evaluated manually by several annotators who are fluent in English. These hypernyms are intended for use in a system where a human annotator is given a set of hypernyms and indicates which are correct and which are incorrect. This facilitates the construction of a lexical resource, a process that was previously done manually. I import over 50,000 hypernym relationships into Roget’s Thesaurus. The estimated overall accuracy across the entire hypernym set is 73%.
As a final test, the new relationships imported into the Thesaurus are used to improve the capacity of Roget’s Thesaurus to calculate semantic similarity between terms and phrases. The improved similarity function is tested on several applications that make use of semantic similarity. The relationships are also used to improve the Thesaurus’s capacity for solving SAT-style analogy questions.

Similar resources

Disambiguating Hypernym Relations for Roget's Thesaurus

Roget’s Thesaurus is a lexical resource which groups terms by semantic relatedness. A shortcoming of Roget’s is that its relations are ambiguous: it does not name them, and only shows that there is a relation between terms. Our work focuses on disambiguating hypernym relations within Roget’s Thesaurus. Several techniques of identifying hypernym relations are compared and contrasted in this...

Roget's thesaurus and semantic similarity

Roget’s Thesaurus has not been sufficiently appreciated in Natural Language Processing. We show that Roget's and WordNet are birds of a feather. In a few typical tests, we compare how the two resources help measure semantic similarity. One of the benchmarks is Miller and Charles’ list of 30 noun pairs to which human judges had assigned similarity measures. We correlate these measures with those...

Not as Easy as It Seems: Automating the Construction of Lexical Chains Using Roget's Thesaurus

Morris and Hirst [10] present a method of linking significant words that are about the same topic. The resulting lexical chains are a means of identifying cohesive regions in a text, with applications in many natural language processing tasks, including text summarization. The first lexical chains were constructed manually using Roget’s International Thesaurus. Morris and Hirst wrote that autom...

The Semantic Structure of Roget’s Thesaurus Cross-References

This study analyzed a database version of Roget’s Thesaurus (Roget’s International Thesaurus, 3rd Edition, 1962) for connectivity patterns among cross-references in order to identify the implicit conceptual structure. Semantic patterns implicit in the data, at both the local and global levels of the Thesaurus structure, are identified.

Evaluating Roget's Thesauri

Roget’s Thesaurus has gone through many revisions since it was first published 150 years ago. But how do these revisions affect Roget’s usefulness for NLP? We examine the differences in content between the 1911 and 1987 versions of Roget’s, and we test both versions against each other and WordNet on problems such as synonym identification and word relatedness. We also present a novel method for me...

Publication date: 2007